Here we look at the Metacognition subscales (Wells et al.; Packet 3) in detail.
Differences across fieldsites
Mean scores by site
First, let’s look at scores for participants in each site:

Note that this plot includes both average scores for each site (in black), and individual scores for all of the participants in that site (small, colorful points in the background, which are “jittered” around a little so that you can see them all).
I have not looked at this in detail.
Now let’s look at these differences in more detail using the “raw data” for individual questions, rather than these subscale scores.
Responses by question, by site
Joining, by = "question"
Column `question` joining character vector and factor, coercing into character vectorIgnoring unknown aesthetics: fill

I have not yet looked at this in detail, but it looks like it’s going to be interesting!
Distribution of responses for individual participants, by site
Another thing we might be interested in is how individual participants responded: Were there people who said yes to everything, or no to everything? How do these distributions of responses differ across participants in different sites?
Let’s take a look:
Joining, by = "question"
Column `question` joining character vector and factor, coercing into character vector

I have not yet looked at this in detail.
Just for fun, here’s another way to look at the same data, overlaying the density distributions for each site on top of each other to see where they seem to be similar/different:
Joining, by = "question"
Column `question` joining character vector and factor, coercing into character vector

---
title: 'Grappling with the "Metacognition" subscales'
subtitle: 'Last updated 2018-04-08'
output:
  html_notebook: default
  html_document:
    df_print: paged
  pdf_document: default
---

```{r, include = FALSE}
knitr::opts_chunk$set(echo = FALSE, message = FALSE)
```

```{r, include = FALSE}
# set working director
# setwd("/Users/kweisman/Documents/Research (Stanford)/Projects/Templeton Grant/DATA WRANGLING/templeton_packets/packets123/")

# load packages
library(tidyverse)
library(rms)
library(ggdendro)
library(psych)

# load question key (including manual reverse-coding)
question_key <- read.csv("//Users/kweisman/Documents/Research (Stanford)/Projects/Templeton Grant/DATA WRANGLING/templeton_packets/packets123/packets123_question_key_byhand.csv")

# load data (reverse-coded)
d_long <- read_csv("//Users/kweisman/Documents/Research (Stanford)/Projects/Templeton Grant/DATA WRANGLING/templeton_packets/packets123/packets123_data_byquestion_long.csv") %>%
  mutate(ctry = factor(ctry, 
                       levels = c("us", "ghana", "thailand", "china", "vanuatu")))
d_long_subscale <- read_csv("//Users/kweisman/Documents/Research (Stanford)/Projects/Templeton Grant/DATA WRANGLING/templeton_packets/packets123/packets123_data_bysubscale_long.csv") %>%
  mutate(ctry = factor(ctry, 
                       levels = c("us", "ghana", "thailand", "china", "vanuatu")))

# load data (before reverse-coding)
d_all <- read.csv("//Users/kweisman/Documents/Research (Stanford)/Projects/Templeton Grant/DATA WRANGLING/templeton_packets/packets123/packets123_data.csv")

# make custom functions
round2 <- function(x) {format(round(x, 2), digits = 2)}
```

Here we look at the Metacognition subscales (Wells et al.; Packet 3) in detail.

# Differences across fieldsites

## Mean scores by site

First, let's look at scores for participants in each site:

```{r, include = F}
d_long_subscale_boot <- d_long_subscale %>%
  filter(!is.na(sum_score)) %>%
  group_by(ctry, packet, subscale) %>%
  do(data.frame(rbind(smean.cl.boot(.$sum_score)))) %>%
  ungroup() %>%
  filter(subscale != "attn") %>%
  left_join(d_long_subscale %>%
              filter(!is.na(sum_score)) %>%
              count(ctry, packet, subscale)) %>%
  mutate(packet = paste("packet", packet),
         ctry = factor(ctry,
                       levels = c("us", "ghana", 
                                  "thailand", "china", "vanuatu")),
         subscale = 
           factor(subscale,
                  levels = c("exwl", 
                             "exwl_extra",
                             "dse_01to14", 
                             "dse_15to16", 
                             "spev", 
                             "sen_sensory_seeking", 
                             "sen_body_awareness", 
                             "sen_trait_metamood",
                             "her2_hallucination",
                             "invo_VISQ_dialogic_speech", 
                             "invo_VISQ_inner_speech",
                             "invo_VISQ_eval_motiv_inner_speech",
                             "invo_hardy_bentall",
                             "her_posey_losch",
                             "enco_lewicki",
                             "meta_van_elk",
                             "tat_confidence",
                             "tat_positive_beliefs",
                             "tat_cognitive",
                             "tat_uncontrollability",
                             "tat_need_control",
                             "minw_mental_states",
                             "minw_life_events",
                             "minw_inanimate",
                             "minw_selves_souls_world",
                             "minw_epistemic"),
                  labels = c("absorption (tellegen)", 
                             "absorption (extra)",
                             "daily spiritual experiences (#1-14)",
                             "daily spiritual experiences (#15-16)",
                             "spiritual events", 
                             "sensory seeking",
                             "body awareness", 
                             "attention to feelings",
                             "hallucination", 
                             "VISQ: dialogic speech",
                             "VISQ: inner speech",
                             "VISQ: evaluative",
                             "inner speech",
                             "hearing events",
                             "encoding style",
                             "mind metaphors",
                             "metacog.: lack of cognitive confidence",
                             "metacog.: positive beliefs re: worrying",
                             "metacog.: cognitive self-consciousness",
                             "metacog.: uncontrollability/danger",
                             "metacog.: need to control thoughts",
                             "dualism: mental states",
                             "dualism: life events",
                             "dualism: inanimate consciousness",
                             "dualism: minds, selves, & world",
                             "dualism: epistemology")))
```

```{r, fig.width = 6, fig.asp = 0.5}
ggplot(d_long_subscale_boot %>%
         filter(grepl("metacog", subscale)) %>%
         mutate(subscale = as.character(subscale),
                subscale = gsub("metacog.: ", "", subscale)),
       aes(x = ctry,
           y = Mean)) +
  facet_grid(~ subscale) +
  geom_point(data = d_long_subscale %>%
               filter(grepl("tat_", subscale),
                      !is.na(sum_score)) %>%
               mutate(subscale = 
                        factor(subscale,
                               levels = c("tat_cognitive",
                                          "tat_confidence",
                                          "tat_need_control",
                                          "tat_positive_beliefs",
                                          "tat_uncontrollability"),
                               labels = c("cognitive self-consciousness",
                                          "lack of cognitive confidence",
                                          "need to control thoughts",
                                          "positive beliefs re: worrying",
                                          "uncontrollability/danger"))),
             aes(y = sum_score, color = ctry),
             position = position_jitter(width = 0.3, height = 0.15),
             alpha = 0.3, size = 1) +
  geom_pointrange(aes(ymin = Lower, ymax = Upper)) +
  geom_text(aes(label = paste0("(n=", n, ")"), y = Lower), 
            size = 2, nudge_x = 0.15, hjust = 0) +
  # scale_x_discrete(expand = c(0, 1)) +
  scale_y_continuous(limits = c(-12.5, 12.5), breaks = seq(-100, 100, 10)) +
  scale_color_brewer(palette = "Dark2") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1),
        legend.position = "none") +
  labs(title = "Mean metacognition scores by site (Wells et al.)",
       subtitle = "A higher score indicates more endorsements of 'metacognitive' events (range: -12 to 12)\nError bars are bootstrapped 95% confidence intervals",
       x = "Site", color = "Site",
       y = "Mean score")
```

Note that this plot includes both average scores for each site (in black), and individual scores for all of the participants in that site (small, colorful points in the background, which are "jittered" around a little so that you can see them all).

I have not looked at this in detail.

Now let's look at these differences in more detail using the "raw data" for individual questions, rather than these subscale scores.

## Responses by question, by site

```{r, fig.width = 10, fig.asp = 0.8}
d_plot <- d_long %>%
  filter(grepl("tat_", question)) %>%
  left_join(question_key %>%
              distinct(question_label_universal, 
                       question_text,
                       byhand_coding,
                       byhand_subscale) %>%
              rename(question = question_label_universal,
                     coding = byhand_coding,
                     subscale = byhand_subscale)) %>%
  distinct() %>%
  mutate(coding = factor(coding,
                         levels = c("-1", "1"),
                         labels = c("(REVERSED)", "")),
         question_text_short = gsub('(.{1,30})(\\s|$)',
                                    '\\1\n',
                                    paste0(question,
                                           ": ",
                                           substr(question_text,
                                                  start = 1, stop = 10000),
                                           # "...",
                                           coding))) %>%
  count(ctry, subscale, question_text_short, response) %>%
  spread(response, n) %>%
  mutate_at(vars(`-2`:`2`), 
            funs(replace(., is.na(.), 0))) %>%
  arrange(ctry, `-2`) %>%
  rownames_to_column("order") %>%
  mutate(order = as.numeric(order),
         agree_n = `1` + `2`,
         total_n = `-2` + `-1` + `0` + `1` + `2`) %>%
  gather(response, n, c(`-2`, `-1`, `0`, `1`, `2`)) %>%
  mutate(response = factor(response,
                           levels = c("-2", "-1", "0", "1", "2")),
         ctry = factor(ctry,
                       levels = c("us", "ghana", "thailand", 
                                  "china", "vanuatu"),
                       labels = c("US", "Ghana", "Thailand", 
                                  "China", "Vanuatu"))) %>%
  distinct()

adjust <- 15

ggplot(d_plot %>%
         mutate(response = factor(response,
                                  labels = c("Strongly disagree", "Disagree",
                                             "Neither agree nor disagree",
                                             "Agree", "Strongly agree")),
                subscale = 
                        factor(subscale,
                               levels = c("tat_cognitive",
                                          "tat_confidence",
                                          "tat_need_control",
                                          "tat_positive_beliefs",
                                          "tat_uncontrollability"),
                               labels = c("cognitive self-consciousness",
                                          "lack of cognitive confidence",
                                          "need to control thoughts",
                                          "positive beliefs re: worrying",
                                          "uncontrollability/danger"))),
       aes(x = reorder(question_text_short, desc(question_text_short)),
           # x = reorder(question_text_short, desc(order)),
           y = n, fill = response)) +
  facet_grid(subscale ~ ctry, scales = "free", space = "free") +
  geom_bar(position = position_stack(), stat = "identity", 
           color = "black", size = 0.2) +
  geom_text(data = d_plot %>%
              distinct(ctry, subscale, question_text_short, order, 
                       agree_n, total_n) %>%
              mutate(subscale = 
                        factor(subscale,
                               levels = c("tat_cognitive",
                                          "tat_confidence",
                                          "tat_need_control",
                                          "tat_positive_beliefs",
                                          "tat_uncontrollability"),
                               labels = c("cognitive self-consciousness",
                                          "lack of cognitive confidence",
                                          "need to control thoughts",
                                          "positive beliefs re: worrying",
                                          "uncontrollability/danger"))),
            aes(y = max(d_plot$total_n) + adjust, fill = NULL,
                label = paste0(round(agree_n/total_n, 2)*100, "%")), 
            size = 3, hjust = 1) +
  scale_y_continuous(limits = c(0, max(d_plot$total_n) + adjust)) +
  scale_fill_brewer(palette = "PRGn") +
  theme_bw() +
  labs(title = "Responses to 'Metacognition' (Wells et al.) scale items",
       subtitle = "% corresponds to responses of 'Agree' or 'Strongly agree' (green)",
       x = "", y = "Count of responses", fill = "Response") +
  theme(legend.position = "top",
        plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5)) +
  # theme(axis.text.x = element_text(angle = 90, hjust = 1))
  coord_flip()

rm(adjust)
```

I have not yet looked at this in detail, but it looks like it's going to be interesting!

# Distribution of responses for individual participants, by site

Another thing we might be interested in is how individual participants responded: Were there people who said yes to everything, or no to everything? How do these distributions of responses differ across participants in different sites?

Let's take a look:

```{r, fig.width = 6, fig.asp = 3}
d_long %>%
  filter(grepl("tat_", question),
         !grepl("attn", question), !is.na(response)) %>%
  left_join(question_key %>%
              distinct(question_label_universal, 
                       question_text,
                       byhand_coding,
                       byhand_subscale) %>%
              rename(question = question_label_universal,
                     coding = byhand_coding,
                     subscale = byhand_subscale)) %>%
  distinct() %>%
  mutate(response = factor(response),
         coding = factor(coding,
                         levels = c("-1", "1"),
                         labels = c("(REVERSED)", "")),
         question_text_short = gsub('(.{1,130})(\\s|$)',
                                    '\\1\n',
                                    paste0(substr(question_text,
                                                     start = 1, stop = 10000),
                                              # "...",
                                              coding))) %>%
  distinct() %>%
  count(ctry, subscale, subj, response) %>%
  spread(response, n) %>%
  mutate_at(vars(c(`-2`, `-1`, `0`, `1`, `2`)), funs(replace(., is.na(.), 0))) %>%
  mutate(total_n = `-2` + `-1` + `0` + `1` + `2`,
         prop_n2 = `-2`/total_n,
         prop_n1 = `-1`/total_n,
         prop_n0 = `0`/total_n,
         prop_p1 = `1`/total_n,
         prop_p2 = `2`/total_n,
         ctry = factor(ctry,
                       levels = c("us", "ghana", "thailand", 
                                  "china", "vanuatu"),
                       labels = c("US", "Ghana", "Thailand", 
                                  "China", "Vanuatu"))) %>%
  distinct() %>%
  select(-c(`-2`, `-1`, `0`, `1`, `2`)) %>%
  gather(response, prop, starts_with("prop")) %>%
  mutate(response = factor(response,
                           levels = c("prop_n2", "prop_n1", "prop_n0", 
                                      "prop_p1", "prop_p2"),
                           labels = c("Strongly disagree", "Disagree",
                                      "Neither...",
                                      "Agree", "Strongly agree")),
         subscale = factor(subscale,
                           levels = c("tat_cognitive",
                                      "tat_confidence",
                                      "tat_need_control",
                                      "tat_positive_beliefs",
                                      "tat_uncontrollability"),
                           labels = c("cognitive self-consciousness",
                                      "lack of cognitive confidence",
                                      "need to control thoughts",
                                      "positive beliefs re: worrying",
                                      "uncontrollability/danger"))) %>%
  ggplot(aes(x = prop, fill = ctry)) +
  facet_grid(interaction(response, subscale, sep = "\n") ~ ctry, 
             scales = "free", space = "free") +
  geom_histogram(binwidth = 1/5) +
  scale_x_continuous(breaks = seq(0, 1, 0.25)) +
  scale_y_continuous(breaks = seq(0, 300, 10)) +
  scale_fill_brewer(palette = "Dark2") +
  theme_bw() +
  labs(title = "Distributions of how many times participants endorsed each response option for the 'Metacognition' (Wells et al.) scale items",
       x = "Proportion of responses (at the individual participant level)", y = "Count of participants", fill = "Site") +
  theme(legend.position = "none",
        plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5))
```

I have not yet looked at this in detail.

Just for fun, here's another way to look at the same data, overlaying the density distributions for each site on top of each other to see where they seem to be similar/different:

```{r, fig.width = 6, fig.asp = 1}
d_long %>%
  filter(grepl("tat_", question),
         !grepl("attn", question), !is.na(response)) %>%
  left_join(question_key %>%
              distinct(question_label_universal, 
                       question_text,
                       byhand_coding,
                       byhand_subscale) %>%
              rename(question = question_label_universal,
                     coding = byhand_coding,
                     subscale = byhand_subscale)) %>%
  distinct() %>%
  mutate(response = factor(response),
         coding = factor(coding,
                         levels = c("-1", "1"),
                         labels = c("(REVERSED)", "")),
         question_text_short = gsub('(.{1,130})(\\s|$)',
                                    '\\1\n',
                                    paste0(substr(question_text,
                                                     start = 1, stop = 10000),
                                              # "...",
                                              coding))) %>%
  distinct() %>%
  count(ctry, subscale, subj, response) %>%
  spread(response, n) %>%
  mutate_at(vars(c(`-2`, `-1`, `0`, `1`, `2`)), funs(replace(., is.na(.), 0))) %>%
  mutate(total_n = `-2` + `-1` + `0` + `1` + `2`,
         prop_n2 = `-2`/total_n,
         prop_n1 = `-1`/total_n,
         prop_n0 = `0`/total_n,
         prop_p1 = `1`/total_n,
         prop_p2 = `2`/total_n,
         ctry = factor(ctry,
                       levels = c("us", "ghana", "thailand", 
                                  "china", "vanuatu"),
                       labels = c("US", "Ghana", "Thailand", 
                                  "China", "Vanuatu"))) %>%
  distinct() %>%
  select(-c(`-2`, `-1`, `0`, `1`, `2`)) %>%
  gather(response, prop, starts_with("prop")) %>%
  mutate(response = factor(response,
                           levels = c("prop_n2", "prop_n1", "prop_n0", 
                                      "prop_p1", "prop_p2"),
                           labels = c("Strongly disagree", "Disagree",
                                      "Neither agree nor disagree",
                                      "Agree", "Strongly agree")),
         subscale = factor(subscale,
                           levels = c("tat_cognitive",
                                      "tat_confidence",
                                      "tat_need_control",
                                      "tat_positive_beliefs",
                                      "tat_uncontrollability"),
                           labels = c("cognitive self-consciousness",
                                      "lack of cognitive confidence",
                                      "need to control thoughts",
                                      "positive beliefs re: worrying",
                                      "uncontrollability/danger"))) %>%
  ggplot(aes(x = prop, fill = ctry)) +
  facet_grid(response ~ subscale, 
             scales = "free", space = "fixed") +
  geom_density(alpha = 0.3) +
  scale_x_continuous(breaks = seq(0, 1, 0.25)) +
  scale_y_continuous(breaks = seq(0, 300, 10)) +
  scale_fill_brewer(palette = "Dark2") +
  theme_bw() +
  labs(title = "Distributions of how many times participants endorsed each\nresponse option for the 'Metacognition' (Wells et al.) scale items",
       x = "Proportion of responses (at the individual participant level)", y = "Count of participants", fill = "Site") +
  theme(legend.position = "top",
        plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5))
```
